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ABSTRACT 

The analysis of pulsar timing data, especially in pulsar timing array (PTA) projects, 
has encountered practical difficulties: evaluating the likelihood and/or correlation- 
based statistics can become prohibitively computationally expensive for large datasets. 
In situations where a stochastic signal of interest has a power spectral density that 
dominates the noise in a limited bandwidth of the total frequency domain (e.g. the 
isotropic background of gravitational waves), a linear transformation exists that trans- 
forms the timing residuals to a basis in which virtually all the information about the 
stochastic signal of interest is contained in a small fraction of basis vectors. By only 
considering such a small subset of these "generalised residuals" , the dimensionality of 
the data analysis problem is greatly reduced, which can cause a large speedup in the 
evaluation of the likelihood: the ABC-method (Acceleration By Compression). The 
compression fidelity, calculable with crude estimates of the signal and noise, can be 
used to determine how far a dataset can be compressed without significant loss of 
information. Both direct tests on the likelihood, and Bayesian analysis of mock data, 
show that the signal can be recovered as well as with an analysis of uncompressed 
data. In the analysis of IPTA Mock Data Challenge datasets, speedups of a factor of 
three orders of magnitude are demonstrated. For realistic PTA datasets the accelera- 
tion may become greater than six orders of magnitude due to the low signal to noise 
ratio. 
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1 INTRODUCTION 

In the past several decades, pulsar timing has been suc- 
cessfully used to study a wide range of science. Past 
successes include the con firmation of gravitational waves 
jTavlor fc Weisberdll982l). a nd ve ry accurate tests of gen- 
eral relativity ( Kramer et al.|[2006l ). The interesting science 
of these examples stems from the fact that accurate mea- 
surements of the times of arrival (TOAs) of the radio pulses 
allow for a precise determination of the trajectory of the 
pulsar relative to the Earth. This is possible because the 
TOAs can be accurately accounted for by current models 
of the pulsar trajectory, pulse propagation, and pulsar spin 
evolution in relativistic gravity. 

Among on-going pulsar timing projects are Pul- 
sar Timing Arrays (PTAs), which are programmes de- 
signed to detect low-frequency (10~^ — 10~*Hz) extra- 
galactic gravitational-waves (GWs) directly, by using a 
set of Galact ic millisecond pulsars as nearly-perfect Ein- 
stein clocks ([Foster fc Backeil Il990l ) . GWs perturb space- 
time between the pulsars and the Earth, and this cre- 
ates detectable deviations from the strict periodicity in 
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the TOAs dEstabrook fc Wahlguis^ Il975l : ISazhinI Il978l : 
lDetweilerlll979l ). One of the main source candidates for 
PTAs is an isotropic stochastic background of gravita- 
tional waves (GWB), thought to be generated by a large 
number of massive black-hole binaries l ocated at the 
centres of galaxies (iBegelman et al.l [l980l: PhinnevI 1200 ll: 



Jaffe fc Backer! l2003l : IWvithe fc Loebl 



200^ . 



2003 



by relic gravitational-waves (jGrishchuk 



Sesana et al.l 
20051 ). 



or, more speculatively, by oscillating cosm i c-string loops 
JPainour fc Vilenkin|[2005l : lOlmez et al]|2010l : ISanidas et al] 
l2012h . 

The analysis of pulsar timing data, and even more 
so PTA data, can become prohibitively time-consuming 
for large datasets. This is especially true for Bayesian 
data analysis methods, l ike the analysis of PTA data 
(|van Haasteren et al.ll2009l . hereafter vHLML), and the cor- 
rection for dispersion measure variations (Lee et al., in 
prep.). Typically, the computational cost scales as n"^ or n^, 
with n the total number of observations; the computational 
difficulties will increase sharply over time. 

In this work, one possible solution for the computa- 
tional difficulties is explored in the case the signal of inter- 
est is a time-correlated stochastic signal: the ABC-method 
(Acceleration By Compression). The ABC-method is based 
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on lossy linear data compression. By significantly reduc- 
ing the dimensionality of the problem, the evaluation of 
computationally expensive quantities can be greatly acceler- 
ated. We specifically focus on the International PTA (IPTA) 
Mock Data Challenge (released by M. Keith, K. J. Lee, and 
F. A. JenelQ), in which the GWB is a good example of a 
compressible stochastic signal. 

The outline of the paper is as follows. In Section [2] we 
briefly review the relevant theory of pulsar timing observa- 
tions, with a special attention to the likelihood in the pres- 
ence of time-correlated stochastic signals. We introduce the 
ABC-method, and the compressibility of datasets, in Sec- 
tion [31 In Section U we look into some of the practicalities 
concerned with compression of PTA data, and investigate 
the computational demand of different terms in the evalua- 
tion of the likelihood. In that section, we provide and test a 
method based on cubic spline interpolation to estimate the 
compressed covariance matrix. This causes an extra speedup 
of a few orders of magnitude. Finally we present our conclu- 
sions in Section [5l 



2 PTA DATA ANALYSIS 

The typical data processing pipeline for pulsar timing ob- 
servations processes the raw baseband data in several data 
reduction steps, where at each data reduction step the data 
volume is drastically reduced. The data reduction steps con- 
dense the scientifically interesting information into a signifi- 
cantly smaller number of data points, sometimes mitigating 
noise in the process. At the end of the pipeline we are left 
with TOAs. 

This work proposes a method to compress the TOA 
data even further to what we call generalised residuals. The 
data compression is based on the likelihood for the TOAs 
and the Fisher information, with information preserved only 
for a specific stochastic signal. To this end, we review the 
theory of TOAs, the likelihood, and inclusion of the timing- 
model in this section. 



2.1 The likelihood 

We consider k pulsars, with TOAs for the a-th pulsar, 
where the n' = TOAs are described as an addition 

of a deterministic and a stochastic part. In the observations 
this distinction is blurred because we cannot fully separate 
the stochastic contributions from the deterministic contri- 
butions. In practice we therefore work with timing residuals 
that are produced using first estimates /3oi of the m timing- 
model parameters jSi {i between 1 and m); this initial guess 
is usually assumed to be accurate e nough to use a linea r 
approximation of the timing-model (jEdwards et al.|[2006h . 
Here m = Yla=i sum of the number of timing- 

model parameters of all the individual pulsars. In this linear 
approximation, the timing-residuals depend on (,i = Pi — Poi 



->/ ^prf ^ 

St =St +M(„ 



(1) 



where 5t are the timing-residuals in the linear approxi- 
mation to the timing-model, 5lF is the vector of pre-fit 
timing-residuals, ^ is the vector with timing-model param- 
eters for all k pulsars, and the (n' x m) matrix M is the 
so-called design matrix (see e.g. §15.4 of IPress et al.|[r992l . 
vHLML), which describes how the timing-residuals depend 
on the model parameters. As an example, for a simple timing 
model which only contains quadratic spindown, the matrix 
M is a in' x 3) matrix, with the j'-th column describing a 
(j — l)-th order polynomial. The elements of M are then: 



with ti the i-th TOA. 



Identical to vHLML and Ivan Haasteren fc LevinI (|2012l . 
hereafter vHL), we model the stochastic contributions to the 
TOAs as a time-correlated stochastic signal, described by a 
random Gaussian process. The corresponding likelihood is 
equal to: 



Pi5t 



exp 



-1 {st - M^j^ C'-^ {st - utj 



\/(27r)"' det C" 

(2) 

where (f> is the vector describing all the stochastic model pa- 
rameters, and C' — C {(j)) is the covariance matrix of the 
sum of all stochastic signals. This includes the measurement 
uncertainties, the timing noise (red spin noise), and a pos- 
sible GWB. 



2.2 Marginalising over the timing-model 

Using Equation ([2]) is computationally not very efficient be- 
cause of the large number of timing model parameters. How- 
ever, in the case of uniform priors (vHLML) and Gaussian 
priors (vHL) it is possible to analytically marginalise the 
posterior distribution over the timing model parameters. In 
the remainder of this work we assume no prior information 
about the timing model parameters, and use uniform priors. 

In their search for a simplified representation of the ana- 
lytic marginalisation procedure, vHL decomposed the design 
matrix into an orthogonal basis based on the singular value 
decomposition M — VEV*, where U and V are (n' x n') 
and (m x m) orthogonal matrices, and E is an (n' x m) di- 
agonal matrix. The first m columns of U span the column 
space of M, and the last n = n' — m columns of U span the 
complement. We denote these two subspace bases as F and 
G respectively: U = {F G) . In Section FS.ll we show that 
G is actually a lossless data compression matrix. 

Now, integrating over ^, our marginalised likelihood be- 
comes (vHLML): 



det {FT C'-^F)-^ 
— — — X 
^(27r)" det C 

f 1 _i 

exp I —-^51 Cp St j , 



with: 

Cp^ = C''^ -C'^^F (f'^C'^^F 
Cp GG C GG , 



(3) 



(4) 



|http: //www. Ipta4g«. org/?page_ld=214| 



where the singular matrix Cp^ is the inverse of Cp in the 
non-singular subspace of its basis. The singular matrix ma- 
trix Cp is the post-fit covariance matrix of the timing- 
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residuals (vHL; iDemorest et al.ll2012l , hereafter D12). D12 
use a pseudo-inverse based on a singular value decomposi- 
tion of C' to evaluate Cp^ in their evaluation of a GWB 
detection statistic; this is equivalent to marginalising over 
the timing model parameters (vHL). 



3 THE ABC-METHOD 

Data compression is the encoding of information in a smaller 
data volume than the original information data volume. This 
can be done without losin g information (lossle ss), or with 
losing information (lossy) (|Wade fc Wadelfl994 ). We would 
like to use data compression to reduce our data volume, with 
the aim of speeding up the computations that are necessary 
for the analysis of PTA data. In this work we compress the 
data in such a way to retain the sensitivity to one stochastic 
signal (e.g. the isotropic background of gravitational waves): 
the "ABC-method" (Acceleration By Compression). 

In Section 13.11 we show that marginalisation over the 
timing model parameters is equivalent to lossless data com- 
pression. In Section 13.21 we expand the data compression 
formalism, and show how to construct a basis in which sen- 
sitivity to a particular signal is retained. We define the cor- 
responding compression fidelity in Section 13.31 Finally, in 
Section 13.41 and Section 13.51 we discuss how to interpret 
the compressed basis of generalised residuals, and how far a 
dataset can be compressed without significant loss of infor- 
mation. 



3.1 Marginalisation = lossless data compression 

vHL showed that Equation ^ can be rewritten as: 

exp [-^St'^G {C^C'Cy' G^St' 



I 



^(27r)"det (G^C'G) 



(5) 

with notation as in Section 12.21 This is an unmarginalised 
likelihood of a random Gaussian process in n dimensions, 
with data St — G^St and covariance matrix C = G^C'G. 
The dimensionality of the data is reduced from n' to n due 
to the marginalisation process. From here onwards, we start 
the convention that a prime superscript denotes that a vec- 
tor or a covariance matrix lies in in the larger unmarginalised 
space, whereas no prime denotes that either of them lies in 
the marginalised space. The vector St contains all the infor- 
mation about all stochastic signals: marginalisation over the 
timing model parameters is the same as lossless linear data 
compression in this formalism. The matrix G is our linear 
data compression matrix, and 5t is our vector of reduced 
data. 



3.2 Lossy linear data compression 

We would like to compress the reduced data St even further, 
without losing too much information about the stochastic 
signal of our interest. We expect this to be possible, since 
usually the signal and the noise differ in power spectral den- 
sity. Only some parts of the spectrum are dominated by the 
signal; other parts are dominated by the noise. The data 
compression scheme in this work is based on throwing away 



the parts of the data that are dominated by the noise by 
using linear data compression: x = St, with x the com- 
pressed data, or "generalised residuals" as we will call them, 
and H the compression matrix. Here the number of columns 
of H is less than the number of rows, where we define the 
compression to be the total number of timing residuals di- 
vided by the number of compressed generalised timing resid- 
uals. We derive one possible scheme to construct a suitable 
H in this section. 

In order to determine how much information about our 
signal of interest is in our data, we use the Fisher informa- 
tion. We acknowledge that formally the Fisher information 
does not completely quantify how well a parameter can be 
confined with a specific datas et, especially in the case of a 
low signal-to-noise ratio (e.g. I Vallisneril |2008| ) . but in this 
exploratory work we consider the Fisher information as a 
sufficient first attempt. Denoting the log- likelihood of Equa- 
tion (O as A, we find for the Fisher information: 



2^'U^ d<t> 



(6) 



where Xe^ is the Fisher information, and 4> and 9 are model 
parameters that affect the signal power spectral density. 
Suppose that the stochastic processes in the reduced data St 
are described by the covariance matrix C = E -f a S, where 
E is the covariance matrix of the noise, and S is the co- 
variance matrix of the signal of interest with amplitude . 
We would like to know which basis vectors have the largest 
contribution to the Fisher information, which would be eas- 
iest to determine if we could completely diagonalise the ma- 
trices in the trace of Equation [G] This is possible with a 
non-orthogonal transformation. Even though the inner prod- 
uct is not preserved in such a transformation, the trace re- 
mains invariant. We use a square root of the noise matrix, 
TiZi^^'^ — E^^'''^, to do this. For the moment we assume that 
this estimate of Eu, is indeed correct, but in Section [3.41 we 
argue that an inaccurate noise estimate still results in a us- 
able compression. In this new basis, the whitened data and 
covariance become St'" = T,^^^^St and C" — 'Sw^^^C'Sw^^^ . 
The maximum sensitivity based on the Fisher information 
now has a simple form: 



Var(a) 



5? a Taa 



(l + aAO^ 



(7) 



where Xi is the i-th eigenvalue of C". The aXi should be 
interpreted as signal to noise ratios. We can only evaluate 
Equation ([7]) if we have complete knowledge of the signal S, 
the signal amplitude a, and the noise E. However, the Xi and 
the corresponding basis vectors do not depend on a, which 
means we can examine the sensitivity to a as a function of 
the number of Xi we include. Here, we do assume knowledge 
of E and 5*. 

In the limit where a is large, the strong signal limit, 
we can neglect the one in the denominator of the sum of 
Equation ([7|, which makes all terms in the sum equal. This 
means that all generalised residuals carry equal information 
as is expected in such a case: the noise is negligible com- 
pared to the signal, so no parts of the signal are buried 
under the noise. In the strong signal limit, data compres- 
sion is therefore not possible. Note that the sensitivity is 
then proportional to the number of generalised residuals, as 
it should. 
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In the limit that a is small, the low-signal limit, we can 
neglect all terms aXi in the denominator of the sum, making 
the sensitivity equal to the sum of all a^Xi. The distribution 
of values of the Xi eigenvalues is determined by the power 
spectral density of the signal compared to the noise. If the 
signal spectrum is the same as the noise spectrum, all the A; 
will be identical. However, if the signal has a different spec- 
trum than the noise, the A; can span a wide range of values, 
where the large Xi correspond to basis vectors where the 
signal is relatively large compared to the noise. In this case 
there are nearly-redundant data points, and compression is 
possible. 

3.3 The compression fidelity 

We define the fidelity ^ G [0, 1] to be the fraction of the total 
sensitivity we retain in our compressed data. We choose the 
number of generalised residuals that we keep, I, to be the 
smallest number such that: 



(8) 



where we have ordered the Xi to have the largest values for 
the lowest indices. We typically work with ^ ^ 0.99, which 
in favourable cases like the IPTA Mock Data Challenge al- 
lows for compressions greater than 10: less than 10% of the 
original data volume is kept. 

Computationally, we suggest to use a singular value de- 
composition to produce the eigenvalues and eigenvectors of 
C"', where our fidelity criterion. Equation ([8]), keeps only I 
of the n generalised residuals 5tY ■ We construct the (n x /) 
matrix W as consisting of the columns of the I eigenvectors 
that belong to the selected eigenvalues. The data compres- 
sion matrix is now: H = Hj^^'^W . Using Equation ([Sjl, we 
now find for the likelihood of the compressed data x — H^6t: 

exp [-if^ {H'^CHy^xj 



P{x\<t>) 



(9) 



^{2Tvy det (E„) det (RTCH) ' 

where the extra determinant of E^j comes from the whiten- 
ing, and can be ignored in practice as it is absorbed in the 
overall normalisation constant. This equation is the basis of 
the ABC-method, as the computationally expensive inver- 
sion has been replaced with a lower-dimensional one. 

Equation (O is completely gene ral, and can readily 
be ap plied to realistic datasets. As in Ivan Haasteren et al.l 
(|201ll ). all timing-model parameters and jumps can be in- 
cluded in the likelihood, and are therefore by design part 
of the data compression scheme. We therefore expect not to 
encounter any difficulties in applying data compression to 
realistic data sets, even though in this work we only test the 
effectiveness on the Mock Data Challenge. 

3.4 Interpreting tiie compressed basis 



As we have discussed in Section r3.ll marginalising over the 
timing model is the same as linear data compressing to the 
subspace of the original data St orthogonal to the columns 
of the design matrix M. Similarly, the data compression we 
suggest in Section lX^ is equivalent to marginalising over vec- 
tors that lie in the subspace orthogonal to the column space 
of H with uniform priors. By considering the data in the 



basis orthogonal to the column space of H to be nuisance 
parameters with uniform priors, the resulting likelihood of 
the compressed generalised residuals becomes independent 
of the value of the data in the orthogonal complement (we 
have not found another prior with the same property) . This 
interpretation of data compression in terms of marginalisa- 
tion assures us that we are not introducing any biases or 
unwanted systematics in our analysis. The difference with 
the marginalisation over the timing model is that we do 
not marginalise over physical nuisance parameters; we are 
throwing away information. The data compression matrix 
H as constructed in Section 13.21 guarantees that we throw 
away as little information about the signal amplitude a as 
possible. This is not true for the other parameters this signal 
may also depend on: optimal sensitivity to those parameters 
possibly requires a different basis, construction of which is 
subject of ongoing follow-up research. We ignore this issue 
in the rest of this exploratory work, and assume that sensi- 
tivity to the signal amplitude is sufficient for our purposes. 

The interpretation of data compression in terms of 
marginalising over non-physical parameters assures us that 
the likelihood of the compressed data in Equation ^ is also 
valid if we do not provide good estimators for Euj and S. We 
may be throwing away more information than we thought 
if our estimators are not accurate, but we do not introduce 
any bias or systematics in our likelihood. It is therefore not 
imperative to be thorough in the estimates of the signal and 
the noise; a reasonable guess may be sufficient for practical 
purposes. 

It is instructive to inspect the compressed basis vectors 
GH for highly compressible signals. We choose the IPTA 
Mock Data Challenge as an example, since the GWB sig- 
nal strongly dominates the noise at the lowest frequencies 
in these datasets. In Figure [1] we present the first three 
compressed basis vectors for J0030+0451 and J0437-4715 
of Mock Data Challenge open 1. We observe that, roughly, 
the first basis vector corresponds to a third-order polyno- 
mial: start negative, then ascend to a maximum, descend 
to a minimum, and finally end positive. The other two ba- 
sis vectors display a similar behaviour with the order of the 
polynomial equal to the order of the basis vector +2. Note 
that the zeroth, first, and second order are missing due to 
the removal of quadratics in our marginalisation over the 
timing model parameters. 

We note that the compressed basis vectors for both 
pulsars in Figure [1] are similar, except that those of J0437- 
4715 display more high-frequency behaviour. This is because 
J0437-4715 resides in a binary, and the timing-model there- 
fore includes parameters for binary motion. 



3.5 Compressibility: how far can we go? 

A natural question that arises in data compression is how 
much we can compress the data without losing a significant 
amount of information. To answer this question, we consider 
the fidelity as a function of the number of generalised resid- 
uals in the dataset. For compressible datasets we expect the 
fidelity to stay close to one, only to drop for high compres- 
sion rates. One possible measure of compressibility, which 
we use in our application of the ABC-method, is the max- 
imum compression for which the fidelity stays above 0.99. 
This maximum compression depends on the signal ampli- 
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Figure 1. The first three compressed basis vectors of J0030+0451 
and J0437-4715 of IPTA Mock Data Challenge open 1. These 
basis vectors are the the first three columns of the matrix GH. 
The basis vectors are normalised, so we have ignored the scaling 
on the y-axis. The basis vectors of J0437-4715 have more high- 
frequency structure due to the fact that J0437-4715 is in a binary. 

tude and power spectral density compared to that of the 
noise. 

As an example, we plot the fidelity of the mock data of 
J0030-I-0451 from the open Mock Data Challenge versus the 
compression in Figure [21 where the signal of interest is the 
gravitational-wave background. In these datasets the noise 
is white. Open dataset 3 does formally contain some extra 
(mildly) red noise which we do include in these plots, but 
the level of red noise is so low that it is negligible in practice. 
Because the signal is of such a different spectral shape than 
the noise, data compression is very efficient. In such a case, 
the higher the noise level compared to the signal, the more 
compressible the dataset is. 

In Figure[2]we plot the fidelity for open dataset 3 in the 
low-signal limit (LSA). Data compression is most efficient in 
the low-signal limit. As a comparison we show the fidelity 
for an incompressible signal in Figure [2] as well. This corre- 
sponds to the high-signal limit. We see that an increase in 
the compression results in an equal decrease in the fidelity. 

3.6 Compressing realistic datasets: a prescription 

For realistic datasets we generally do not know the details of 
the signal and the noise. The noise typically has to be char- 
acterised from the data, and we may not even be certain 
of the presence of a signal of interest. Since the fidelity de- 
pends on estimates of the signal and the noise, it is not clear 
how far exactly we can compress the dataset without losing 
information from the signal. Here, we therefore recommend 
a conservative approach when preparing the ABC-method. 

The a\i in the denominator of Equation ((SJ represents 
the signal relative to the noise. The larger it is relative to 
1, the less likely we will discard that generalised residual. 
Therefore, if we are sure not to overestimate the noise, and 
if we are sure not to underestimate the signal, the com- 
pression fidelity will not be overestimated. Specifically, we 
recommend to calculate the fidelity as follows: 



Figure 2. The fidelity of the GWB signal as a function of the 
compression for the pulsar J0030-I-0451 of all IPTA open data 
challenge sets. Open dataset 3 contains more redundant informa- 
tion than the other two sets because the GWB is smaller, and 
the GWB signal is therefore buried under the noise at a larger 
portion of the spectrum. Open dataset 2 contains more redun- 
dant information than open dataset 1 because the errorbars for 
J0030-I-0451 observations were set higher in open dataset 2 than 
in open dataset 1. We have also plotted the fidelity of the low- 
signal approximation (open 3 LSA) for open dataset 3, and the 
fidelity of an incompressible signal (incompressible). 



1) Construct the noise covariance estimate such that it 
only consists of the TOA uncertainties. 

2) Choose a suitable spectral form for the signal of inter- 
est. For example: this consists of fixing the spectral index 
7 = 13/3 for the GWB. 

3) Use the estimates of vH L (Equation (22) & (24) of 
Ivan Haasteren fc Levinll2012l l to estimate the signal ampli- 
tude. For a GWB signal, this is: 

aGWB = L37xlO-«(^^)(^)\ (10) 

where T is the duration of the experiment. Ah is the dimen- 
sionless GWB amplitude, and ctgwb is the rms residual due 
to the GWB in the data. For other power spectral densities 
a similar calculation to vHL is required. 

By completely ignoring other effects like red spin noise 
in these estimates, we are ensured that we do not throw 
away more information than we should. Indeed, more noise 
in this calculation would mean a higher compression. This 
conservative approach is therefore also guaranteed to work 
in the presence of (strong) red noise. 

We note that this approach can overestimate the fidelity 
if the TOA uncertainties have been overestimated, or when 
the shape of the signal power spectral density has been es- 
timated incorrectly with, for instance, an incorrect spectral 
index. The TOA uncertainties depend on complex details 
of the data reduction pipeline prior to the formation of the 
TOAs and of the cross-c orrelation of the pulse profile with 
a template (jTavloij|l993 ). However, underestimation of the 
TOA uncertainty is uncommon in practice. How to choose a 
suitable basis to be sensitive to the spectral index is a sub- 
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ject of ongoing follow-up research. Here we assume we know 
the spectral index of the signal of interest. 

In the open Mock Data Challenge, shown in Figure [21 
high compressions of over 50 still yield a fidelity close to ^ = 
1 in the low-signal limit. Since realistic datasets are expected 
to be in the low-signal limit - we have not detected a GWB 
yet - we expect high compressions in realistic datasets to 
be possible as well. However, realistic datasets can have far 
more TOAs per pulsar than the 130 TOAs per pulsar in 
the Mock Data Challenge. Since in the low-signal limit only 
a few generalised residuals per pulsar is enough to reach 
^ ^ 0.99, we expect very high compressions, possibly up to 
c = 1000 depending on the size of the dataset, to be realistic 
for initial PTA applications. 



4 LINEAR DATA COMPRESSION IN 
PRACTICE 

Although the raw data of pulsar observations can be quite 
voluminous, the pulsar time of arrival data files are typically 
several kilobytes in size. Because it seems quite unlikely that 
data volume at this stage of the analysis is ever going to be 
a problem, the only reason to resort to data compression is 
because it can greatly accelerate the analysis of pulsar tim- 
ing data. In this section, we discuss the computational costs 
of evaluating the likelihood function with the ABC-method, 
and we present some computational shortcuts. A straight- 
forward application is a Bayesian analysis (e.g. vHLML), 
but other analysis methods described in the time domain 
are expected to see an equally large acceleration (e.g. D12). 
Special attention is given to power-law signals, for which 
we present a convenient approximation of the compressed 
covariance matrix, thereby maximising the effectiveness of 
data compression. 

4.1 Computational demand 

The computational demand of Equation ((Sj) scales as 
(vHL) due to the inversion operation of an (n x n) matrix. 
With linear data compression, we have decreased the size 
of the inversion matrix, which will therefore also decrease 
the computational demands. The computational demand of 
the inversion in Equation ((9]) scales as l^. Depending on the 
compression, this I'' operation may or may not be the com- 
putational bottleneck. For large enough compression factors, 
the computational bottleneck will either be the computation 
of C (n^ operation), or the multiplication H^CH {In^ oper- 
ation). In the case of an array of pulsars, the matrix H will 
be block-diagonal if the data compression has been done per 
individual pulsar. Then, the computation of H^CH can be 
accelerated with a factor of the number of pulsars by block- 
wise multiplication (vHL). 

In this assessment of computational demand, we have 
neglected the construction of the data compression matrix 
H. A computationally expensive singular value decomposi- 
tion of a full covariance matrix is required for this. How- 
ever, this only needs to be done once: we do not change the 
compressed basis during subsequent likelihood evaluations, 
even if we vary the noise/signal parameters during a Markov 
Chain Monte Carlo simulation. Since the compressed basis 
can be calculated for each pulsar individually, we therefore 



do not expect the construction the data compression ma- 
trix _ff to be a computational bottleneck in the foreseeable 
future. 

4.2 Testing the acceleration of the likelihood 

We test the performance of the ABC-method on the IPTA 
Mock Data Challenge: all challenges consist of 130 observa- 
tions per pulsar, with 36 pulsars. Our likelihood contains the 
following determ inistic and stochas tic signal contributions: 

1) the Tempo2 jHobbs et all 120061 ') timing-model parame- 
ters 

2) error bars for every TOA 

3) power-law red timing noise for every pulsar 

4) a correlated GWB 

Evaluation of the likelihood of Equation ((3| took on average 
38.3 second^ where most of that time comes from inverting 
the full covariance matrix. 

We compare the efficiency of Equation ^ to that of the 
data compression likelihood of Equation ((9]), where the lat- 
ter equation becomes Equation ((S)) when the compression 
is 1. In the evaluation of the compressed likelihood, three 
terms take up the majority of the computational cost: 

1) C*^^, the evaluation of the (n x n) elements of the co- 
variance matrix of the GWB. 

2) H^C'^^ H, the matrix multiplication to obtain the com- 
pressed covariance matrix. 

inversion of the compressed covariance ma- 
trix. 

All other operations are negligible compared to these three. 
In Figure [3] we present the computational cost of these 
three terms, together with the sum of the three, in the bot- 
tom panel. The uncompressed likelihood is given as a single 
point. We see that the inversion of the compressed covari- 
ance matrix is the dominant term for low compression fac- 
tors: if roughly 70 or more generalised residuals per pulsar 
are kept. For higher compression factors, the evaluation of 
^Gw jg ^j^g most time-consuming part of the evaluation of 
the likelihood. Because this is an operation that does not 
depend on the compression, compressing the data to less 
than 50 generalised residuals per pulsar does not gain us 
any computational efficiency in this configuration. 

4.3 Signals with unknown amplitude 

As explained in the previous section, in the case where a 
dataset is highly compressible, the computational bottleneck 
becomes evaluating C, which contains C'^^ in the example 
of Section 14.21 at each step of the likelihood function. If we 
label the contributions to the compressed covariance ma- 
trix as H^'EiH, then in some cases it is possible to greatly 
accelerate the evaluation of H^YnH. The simplest type of 
stochastic signal is the type where the power spectral den- 
sity shape is known completely, but the amplitude Ni is an 
unknown model parameter. Examples of signals of this type 
include the stochastic behaviour due to TOA uncertainties 

^ All computations in tliis work arc performed on a single work- 
station, code linked witli an Automatically Tuned Linear Algebra 
System (ATLAS) library tliat came witli the GNU/Linux distri- 
bution. 
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Figure 3. The computational cost of the dominating terms in the 
compressed likelihood, as a function of the number of compressed 
generalised residuals per pulsar. An array of 36 pulsars was used, 
with 130 observations per pulsar. The dominating terms are: 

1) , (dashed line), the evaluation of the (n X n) elements 
of the covariance matrix of the GWB. Only present in the lower 
panel. 

2) H^C^^H, (dotted line), the matrix multiplication to ob- 
tain the compressed covariance matrix. Only present in the lower 
panel. 

3) Interpolation, (gray solid line), the construction of the {I X I) 
compressed covariance matrix H^CH by cubic spline interpola- 
tion. Only present in the upper panel. 

4) {H^CH)~^ , (dash-dotted line), inversion of the compressed 
covariance matrix. 

The total computational cost is shown as a solid line, and the un- 
compressed likelihood of Equation JSjl is shown as an upper limit 
at 130 generalised residuals per pulsar. 

In the lower panel, these terms are evaluated for the compressed 
likelihood of Equation without any computational shortcuts. 
For high compression factors (low number of compressed gen- 
eralised residuals), the evaluation of C**-"^ is dominant, which 
means that further compression docs not buy one more compu- 
tational time. 

In the upper panel the compressed likelihood is evaluated, where 
the cubic spline interpolation method of Section 14.41 is used to 
evaluate C^^"^. In this case, the inversion (H^CH) ^ is always 
the dominant term, and data compression is most efficient. Note 
how the line for (H^CH)^^ is (nearly) identical in both panels. 



(with an unkno wn scalinR, or "EFAC", p arameter), pulse 
phase jitter (e.g. ICordes fc Shannonll201Cll ). or a GWB with 
a known spectral index. For these types of signal we can 
evaluate H^YiiH just once for unit amplitude, and store this 
in memory. Then, each time we need to evaluate the likeli- 
hood function, we can multiply this stored matrix with the 
amplitude Ni to obtain the compressed covariance matrix 
without having to re-calculate such matrices every time. Es- 
pecially when I <^ n, this greatly reduces the time necessary 
to evaluate H^T,iH. 



4.4 Power-law signals 

Most stochastic signal models have more free parameters 
than only an amplitude, and the acceleration method of 
Section 14.31 is not applicable. In this section we present a 



practical solution for signals with two free parameters: an 
amplitude, and some other parameter. We focus only on 
signals with a power-law power spectral density, but we ex- 
pect that the method is also appropriate for other signals 
with a parametrised power spectral density. 

Power-law signals are used in various ways in pulsar 
timing, both as a model f or noise sources (i.e. red s pin noise 
ICordes fc Shannonllioiol : [ Shannon fc CordeEllioiol ). and as 
signal sources (i.e. the istotropic background of gravitational 
waves |Ph mne ^ 1200 ll : Ivan Haasteren et al.|[2009l ) . We use the 
following definition for the power spectral density of a power- 
law signal: 



S{f) = a 



2 / 



Vlyr-iy Vlyr 



f f 



(11) 



where / is the signal frequency, a is the signal amplitude, 
and 7 is the spectral index that describes the steepness of the 
spectrum. The rms in the timing residuals of such a signal 
is given by: o"rms = /o°°df S(f). Because this is an unphysi- 
cal power spectrum that diverges at the low frequencies, in 
practice a third parameter is used to describe a power-law 
signal that represents a lowest frequency /l below which 
the signal is assumed to be zero. The reduced data 5t and 
theref ore also the compressed data x are not affected by /l 
(vHL; iBlandford et al.lll984l : iLee et aLllioil ). 

For highly compressed data, the compressed covariance 
matrix H^CH contains far less elements than C: the num- 
ber of unique elements for this matrix is l{l + l)/2. For a 
single pulsar power-law noise covariance matrix this is typi- 
cally only of the order of a hundred elements, depending on 
the number of observations and the compression. We pro- 
pose to use an interpolation approximation for each element 
of the matrix H^CH as a function of 7, with 1 < 7 < 7. 
The elements of the covariance matrix diverge at both ends 
of the interval. In the case of a single pulsar, this means we 
have l{l + l)/2 functions on the interval 1 < 7 < 7 that we 
want to write an interpolation approximation for. We choose 
a cubic spline interpolation method for this, where the do- 
main of the function is divided in sub-intervals in which the 
function is approximated by a third-order polynomial. We 
construct all polynomials such that their values and deriva- 
tives match at the edges. The only free parameter in this 
approach is the number of cubics used in total. This num- 
ber needs to be tuned for performance. 

In Figure |4] we show the difference between the true 
value and the interpolated value of an arbitrary element of 
H'^CH as a function of 7 for J0030-I-0451 of Mock Data 
Challenge open 2. These results are typical; we find a simi- 
lar plot for every element, where the difference between the 
true value and the interpolated value always inflates near the 
boundaries of the interval. We also show the difference be- 
tween the accompanying log-likelihood A as a function of 7 
for the same dataset. Here we also see that the difference in- 
flates near the boundaries. The precision of the interpolation 
depends on the number of cubic splines used in the interpo- 
lation. For lower numbers of splines in the approximation, 
we saw the accuracy quickly decrease near the boundaries. 
This caused the compressed covariance matrix to become 
non-positive deflnite or singular close to the boundaries. In 
our simulations, 100 equally spaced cubic splines was enough 
on a slightly reduced interval 1.09 < 7 < 6.91 to not run into 
numerical issues. 
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Figure 4. The likelihood and the covariance matrix H^CH as a 
function of 7. For the pulsar J0030-I-0451, with data as in IPTA 
Mock Data Challenge open 2, we used the interpolation technique 
of Section 14.41 to approximate the elements of the compressed 
covariance matrix H^CH. In the upper panel, we have plotted 
log\Sx/x\ of element x (row 1, column 4) of the compressed co- 
variance matrix as a function of the spectral index 7. Here x is 
the true value of the element of H^CH, and Sx is the difference 
between the true value of x, and the interpolated value. This plot 
looked similar for all elements. In the bottom panel wc have plot- 
ted the corresponding quantity for the log-likelihood: log\SA/ A\, 
with A the log-likelihood, and 5A the difference between the true 
and interpolated value. We initialised the cubic spline interpo- 
lation with 100 points, evenly distributed on the interval (1,7). 
We see that the discrepancy between the interpolated and the 
true values grows steeply near the boundaries of the interval. At 
the boundaries, the elements of the compressed covariance matrix 
diverge. 

The cubic spline interpolation removes the necessity to 
calculate the total covariance matrix C. In the top panel of 
Figure [3] -we present the computational cost of the compu- 
tationally dominant terms in the compressed likelihood, in 
the case -where we use cubic spline interpolation for the ele- 
ments of H^CH. The computationally dominant term is the 
inversion (H'^CH)-^ for the whole range of possible com- 
pressions, which means that data compression is maximally 
efficient. We almost reached full capacity of Random Ac- 
cess Memory of our workstation for very low compressions. 
For large datasets with an incompressible signal, this may 
cause problems for the cubic spline interpolation method. 
However, for current applications, we don't believe this to 
be an issue. For the Mock Data Challenge, the total typical 
speedup at 99% fidelity is almost three orders of magnitude. 

4.5 Tests on the IPTA Mock Data Challenge 

We test the ABC-method with the cubic spline interpolation 
technique on the open Mock Data Challenge. We present 
the results here of Mock Data Challenge open 1 because 
the noise level was the same for all pulsars in that chal- 
lenge. That makes it easier to compare the results we see 
here with the fidelity levels of Figure [2] they are approxi- 
mately the same for all pulsars. In Figure[2]we see that for a 
compression of 6, we start to approach ^ ~ 0.99. This corre- 



Figure 5. The compressed likelihood as a function of the GWB 
amplitude and spectral index 7 for IPTA Mock Data Challenge 
open one. No parameters are numerically marginalised over. The 
timing model parameters are analytically marginalised over as 
part of the data compression. On the left panel the likelihood is 
plotted for only pulsar J0030-I-0451, with compression to 10 gen- 
eralised residuals (top), and compression to 22 generalised resid- 
uals (bottom). On the right the likelihood is plotted for the full 
array of pulsars, with compression to 6 generalised residuals per 
pulsar (top), and compression to 22 generalised residuals per pul- 
sar (bottom). In each panel, the blue lines represent the credible 
regions of the compressed likelihood, the red lines, labelled "ref, 
represent the reference credible regions of the uncompressed like- 
lihood of Equation ([5}. The contours represent the Itr (68%), 2(T 
(95%), and 3(t (99.7%) credible regions. The injected values are 
marked with an 'x'. 

sponds to 22 compressed generalised residuals per pulsar. In 
Figure [5] we present the likelihood credible regions for Mock 
Data Challenge open 1 both for the full array of pulsars and 
for pulsar J0030-I-0451, with different compression levels. We 
see that with 22 generalised residuals per pulsar, the com- 
pressed likelihood is practically equal to the uncompressed 
likelihood, as predicted by Figure [2] With less than 22 gen- 
eralised residuals per pulsar, the likelihood credible regions 
are broader, with significant covariance between the GWB 
amplitude and the spectral index. This covariance may par- 
tially be a result of the compressed basis being optimal only 
for the injected value of the spectral index 7 = 4.33; this 
dependence is the subject of follow up work. 

The results of this section hold for all three of the open 
Mock Data Challenge datasets: when the fidelity ^ 0.99, 
the likelihood credible regions where almost indistinguish- 
able from the uncompressed likelihood credible regions. 
With a compression such that the fidelity is significantly 
less than that, the credible regions were broader, with a co- 
variance between the amplitude and spectral index. 



5 CONCLUSIONS 

We investigate the acceleration of the analysis of pulsar tim- 
ing data by compressing the data with a linear transforma- 
tion, without losing a significant amount of information of a 
particular stochastic signal of interest: the ABC-method. In 
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this formalism, marginalisation over the timing-model pa- 
rameters is equivalent to lossless linear data compression. 
We show that when the stochastic signal of interest has a 
significantly different spectrum than the noise, the data is 
highly compressible. The ABC-method is most efficient in 
the low-signal limit, where the signal is buried under the 
noise over most of the frequency range. Data compression 
is not possible in the strong-signal limit, where the signal 
dominates the noise in the whole frequency range. The like- 
lihood function of the compressed signal is computationally 
more efficient, and unbiased. 

We introduce the concepts of compression and compres- 
sion fidelity, where the compression is the total number of 
timing residuals divided by the number of generalised timing 
residuals that are kept in the compression, and the fidelity 
is a measure of the amount of information about the signal 
of interest that is kept in the compression. For the IPTA 
Mock Data Challenge, we show that the compression is of 
the order of 10, at a fidelity ^ = 0.99, if one is interested in 
the isotropic stochastic background of gravitational waves. 

When applied to highly compressible datasets, compu- 
tational shortcuts are required to optimally accelerate the 
evaluation of the compressed likelihood. We present an prac- 
tical method based on cubic spline interpolation of the com- 
pressed covariance matrix. When this interpolation approxi- 
mation is used, the total acceleration of the evaluation of the 
compressed likelihood is c'', with c the compression. We test 
the cubic spline interpolation method, and conclude that it 
works well for the purposes of the IPTA Mock Data Chal- 
lenge. The total acceleration is about three orders of mag- 
nitude for a compression of 10, with results almost identical 
to an analysis without the ABC-method. 

The ABC-method can be readily applied to realistic 
datasets, without any adjustments. Realistic datasets of cur- 
rent Pulsar Timing Arrays are expected to reside in the 
low-signal approximation: no stochastic gravitational-wave 
background has been detected as of yet. Therefore, a high 
compression factor of several hundred is realistic for such 
datasets, which yields a total acceleration of over six orders 
of magnitude. We expect linear data compression to become 
one of the key solutions for the issues related to computa- 
tional cost in pulsar timing array data analysis. 
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